Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add EndNote XML Exporter + Rehaul Importer #11157

Merged
merged 98 commits into from
Apr 17, 2024

Conversation

subhramit
Copy link
Collaborator

@subhramit subhramit commented Apr 7, 2024

Closes #11137
I am attempting to write an EndNote XML Exporter for JabRef.
I have successfully been able to implement the exporter, but there are a few catches, which will hopefully be addressed in follow-up issues and PRs:

  1. Journal/Booktitle values beginning with "IEEE" are not captured:
    Original Library:
    Screenshot 2024-04-07 145906
    On export and import:
    Screenshot 2024-04-07 150630

  2. The author/editor of the third entry changes from "Madrigal" to "Marigal and o S." (refer to the same pair of screenshots).

  3. @koppor @Siedlerchr @HoussemNasri @ThiloteE @calixtus The code is in a very raw stage right now, if you have time to go through it and suggest enhancements, please do.

Importer changes:
The importer had to be tweaked a little to include fields such as tertiary-title corresponding to the Book title/Journal, otherwise the column would remain empty:
Screenshot 2024-04-07 071355

Similarly, many other fields that were not retrieved by the importer as visible in BibTeX source were added to create a one-to-one mapping between the source entry and the one produced by a round trip export-import. The function convertRefNameToType was also modified to add some Entrytypes that were not recognized before.

Update: Major changes have taken place, please follow the discussion for details.

Mandatory checks

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

@subhramit subhramit changed the title Fix for issue 11137 Add Endnote XML Exporter Apr 7, 2024
@subhramit subhramit marked this pull request as ready for review April 8, 2024 11:56
@Siedlerchr
Copy link
Member

In general looks good to me. As a follow up task and PR you can think about converting it to a StreamWriter (StaX).
StaX is more memory efficient. See e..g ModsExporter as an example

Siedlerchr
Siedlerchr previously approved these changes Apr 8, 2024
@Siedlerchr Siedlerchr added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Apr 8, 2024
@subhramit
Copy link
Collaborator Author

In general looks good to me. As a follow up task and PR you can think about converting it to a StreamWriter (StaX). StaX is more memory efficient. See e..g ModsExporter as an example

Yes, in fact I had a couple of plans for the follow up PRs I'm going to put regarding this.

  1. Fixing the IEEE booktitle (point 1 of PR description) (might raise this as an issue once this is merged)
  2. Writing a test for this
  3. StaX as you suggested - will read up on this.

@subhramit
Copy link
Collaborator Author

subhramit commented Apr 8, 2024

@Siedlerchr On resolving the changelog conflict on Github itself, it's saying I "dismissed" your review. I'm sorry, that was accidental.

@Siedlerchr
Copy link
Member

Please do add the test here well. This makes the migration to stax easier Because you then have already a working test to verfiy the functionality.

@subhramit
Copy link
Collaborator Author

Please do add the test here well. This makes the migration to stax easier Because you then have already a working test to verfiy the functionality.

On it.

@subhramit
Copy link
Collaborator Author

Export & Import for alt-title added and tests adapted accordingly.

@@ -1,5 +1,6 @@
@article{,
address = {Cent States Forest Expt Stn, Columbus, OH USA},
alt-title = {Ecology},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

improvement: if alt-title matches journal, then remove alt-title. Maybe, a cleanup at the end.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the same apply if booktitle matches alt-title?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have done that.

@@ -15,6 +16,7 @@ @article{
}
@book{,
address = {Univ Chicago, Chicago, IL USA},
alt-title = {Ecology},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(comment for below)

Here, I see journal set. This is wrong. If the entry is not an article, that field should go into booktitle or another field. Please try to check the table https://en.wikibooks.org/wiki/LaTeX/Bibliography_Management#BibTeX and think on a case-by-case basis. As fallback, just use booktitle before loosing the field.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the original import test xml, it was present as alt-title.
Somewhere I had read during all this that booktitle in the bib is handled by secondary-title in the Endnote xml format. So I'll modify the bib and the xml, and change the StandardField of secondary-type to BOOKTITLE.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, have to undo this. It's importing as booktitle even for articles. Will try to do it the way you suggested.

@subhramit subhramit requested a review from koppor April 17, 2024 17:07
@koppor
Copy link
Member

koppor commented Apr 17, 2024

Review hint: The diff between two commits can be retrieved by URL hacking: 60f3ae2...0527ec7

Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

}
case "urls" -> {
handleUrlList(reader, fields, linkedFiles);
case "periodical", "alt-periodical" -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this, the order periodical and alt-periodical matters. Just leave out alt-periodical - or store it in the field alt-periodical (and keep it if it differs from periodical).

Copy link
Collaborator Author

@subhramit subhramit Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking every sub-field of alt-periodical felt tedious, and also on checking the Endnote dtd I see no alt-periodical. So I decided to leave out.
When I leave out alt-periodical, a journal field comes up in case of a Book. I might be missing something very obvious. Let us take a look at a test case:
image
It is as if it is ignoring the parent being unrecognized alt-periodical, and going in and setting the full-title into the journal field.
How do I prevent this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, found it! It was getting imported in the default case. Fixed it.

}
// Cleanup: Remove alt-title if it matches the journal
String journalOrBooktitle = entry.getField(StandardField.JOURNAL).or(() -> entry.getField(StandardField.BOOKTITLE)).orElse("");
if (entry.hasField(new UnknownField("alt-title"))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exctract new UnknownField("alt-title") to class constant FIELD_ALT_TITLE. You can use intellij refactoring extract field - Ctrl+Shift+F.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@subhramit subhramit requested a review from koppor April 17, 2024 19:07
@koppor koppor added this pull request to the merge queue Apr 17, 2024
Merged via the queue into JabRef:main with commit 7063d2d Apr 17, 2024
20 checks passed
@subhramit subhramit deleted the fix-for-issue-11137 branch April 20, 2024 19:05
@subhramit subhramit added type: enhancement type: feature import export / save and removed status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers labels Aug 16, 2024
@subhramit subhramit changed the title Add Endnote XML Exporter + Rehaul Importer Add EndNote XML Exporter + Rehaul Importer Aug 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add an exporter for Endnote XML
4 participants